Dataset statistics
| Number of variables | 10 |
|---|---|
| Number of observations | 456727 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 7609 |
| Duplicate rows (%) | 1.7% |
| Total size in memory | 26.1 MiB |
| Average record size in memory | 60.0 B |
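The two derived figures in this table follow from the counts above. A minimal check, using only numbers from the table; the memory total is already rounded in the report, so the per-record size comes out only approximately:

```python
# Figures copied from the "Dataset statistics" table.
n_rows = 456_727
n_duplicates = 7_609
total_mib = 26.1  # rounded in the report, so the result below is approximate

duplicate_pct = 100 * n_duplicates / n_rows
avg_record_bytes = total_mib * 1024**2 / n_rows

print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: about {avg_record_bytes:.0f} B")
```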
Variable types
| Numeric | 6 |
|---|---|
| Categorical | 2 |
| DateTime | 1 |
| Text | 1 |
Alerts
| Alert | Type |
|---|---|
| Dataset has 7609 (1.7%) duplicate rows | Duplicates |
| CustID is highly overall correlated with ZipCode_Frequency | High correlation |
| Month is highly overall correlated with Year | High correlation |
| Year is highly overall correlated with Month | High correlation |
| ZipCode_Frequency is highly overall correlated with CustID | High correlation |
| Year is highly imbalanced (65.0%) | Imbalance |
| playscount is highly skewed (γ1 = 29.65574724) | Skewed |
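The report does not say how the imbalance percentage is defined. One common definition, assumed here, is one minus the Shannon entropy of the class frequencies divided by its maximum possible value. Under that assumption, the Year counts listed later in this report give a score of about 65%, in line with the alert:

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log(k): 0 for a uniform column, close to 1 when one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        # Normalization is undefined for a single class; 0.0 is a convention chosen here.
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)

# Year counts (2013, 2012, 2011) copied from the "Common Values" table of this report.
year_counts = [400_760, 54_721, 1_246]
score = imbalance_score(year_counts)
print(f"Imbalance: {100 * score:.1f}%")
```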
Reproduction
| Analysis started | 2024-01-26 00:36:42.984741 |
|---|---|
| Analysis finished | 2024-01-26 00:40:22.578039 |
| Duration | 3 minutes and 39.59 seconds |
| Software version | ydata-profiling v4.6.4 |
| Download configuration | config.json |
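The duration row can be recomputed from the two timestamps. A small sketch using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" table.
started = datetime.fromisoformat("2024-01-26 00:36:42.984741")
finished = datetime.fromisoformat("2024-01-26 00:40:22.578039")

elapsed = finished - started
minutes, seconds = divmod(elapsed.total_seconds(), 60)
print(f"{int(minutes)} minutes and {seconds:.2f} seconds")
```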
CustID
Real number (ℝ)
HIGH CORRELATION 
| Distinct | 5000 |
|---|---|
| Distinct (%) | 1.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2027.6046 |
| Minimum | 0 |
| Maximum | 4999 |
| Zeros | 397 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.5 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 86 |
| Q1 | 687 |
| median | 1801 |
| Q3 | 3251 |
| 95-th percentile | 4628 |
| Maximum | 4999 |
| Range | 4999 |
| Interquartile range (IQR) | 2564 |
Descriptive statistics
| Standard deviation | 1479.8317 |
|---|---|
| Coefficient of variation (CV) | 0.72984234 |
| Kurtosis | -1.1125144 |
| Mean | 2027.6046 |
| Median Absolute Deviation (MAD) | 1235 |
| Skewness | 0.3613793 |
| Sum | 9.2606177 × 10⁸ |
| Variance | 2189901.8 |
| Monotonicity | Increasing |
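Several of the figures above are tied together by simple identities: the IQR is Q3 minus Q1, the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the number of observations. A quick consistency check on the CustID values, with small differences expected because the inputs are rounded:

```python
# Values copied from the CustID tables in this report.
n = 456_727
mean = 2027.6046
std = 1479.8317
q1, q3 = 687, 3251
minimum, maximum = 0, 4999

iqr = q3 - q1                     # interquartile range
value_range = maximum - minimum   # range
cv = std / mean                   # coefficient of variation
variance = std ** 2               # variance from the standard deviation
total = mean * n                  # sum of all values

print(iqr, value_range)
print(f"CV={cv:.5f} variance={variance:.1f} sum={total:.4e}")
```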
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
| 0 | 397 | 0.1% |
| 1 | 389 | 0.1% |
| 2 | 366 | 0.1% |
| 3 | 355 | 0.1% |
| 4 | 353 | 0.1% |
| 5 | 350 | 0.1% |
| 6 | 346 | 0.1% |
| 9 | 341 | 0.1% |
| 7 | 338 | 0.1% |
| 8 | 327 | 0.1% |
| Other values (4990) | 453165 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 0 | 397 | |
| 1 | 389 | |
| 2 | 366 | |
| 3 | 355 | |
| 4 | 353 | |
| 5 | 350 | |
| 6 | 346 | |
| 7 | 338 | |
| 8 | 327 | |
| 9 | 341 |
Maximum 10 values
| Value | Count | Frequency (%) |
| 4999 | 61 | |
| 4998 | 65 | |
| 4997 | 47 | |
| 4996 | 62 | |
| 4995 | 67 | |
| 4994 | 67 | |
| 4993 | 57 | |
| 4992 | 67 | |
| 4991 | 63 | |
| 4990 | 53 |
Gender
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.5 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 456727 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 273694 | 59.9% |
| 1 | 183033 | 40.1% |
Length
Histogram of lengths of the category
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 273694 | |
| 1 | 183033 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 456727 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 273694 | |
| 1 | 183033 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 456727 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 273694 | |
| 1 | 183033 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 456727 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 273694 | |
| 1 | 183033 |
zip
Real number (ℝ)
| Distinct | 4695 |
|---|---|
| Distinct (%) | 1.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 50276.704 |
| Minimum | 1002 |
| Maximum | 99347 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.5 MiB |
Quantile statistics
| Minimum | 1002 |
|---|---|
| 5-th percentile | 6787 |
| Q1 | 27502 |
| median | 49925 |
| Q3 | 73526 |
| 95-th percentile | 95006 |
| Maximum | 99347 |
| Range | 98345 |
| Interquartile range (IQR) | 46024 |
Descriptive statistics
| Standard deviation | 27547.412 |
|---|---|
| Coefficient of variation (CV) | 0.54791602 |
| Kurtosis | -1.1272469 |
| Mean | 50276.704 |
| Median Absolute Deviation (MAD) | 23124 |
| Skewness | 0.030376972 |
| Sum | 2.2962728 × 10¹⁰ |
| Variance | 7.5885989 × 10⁸ |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
| 72132 | 475 | 0.1% |
| 4042 | 452 | 0.1% |
| 17307 | 389 | 0.1% |
| 66216 | 366 | 0.1% |
| 5341 | 364 | 0.1% |
| 57445 | 361 | 0.1% |
| 50858 | 360 | 0.1% |
| 48915 | 359 | 0.1% |
| 36690 | 355 | 0.1% |
| 61377 | 353 | 0.1% |
| Other values (4685) | 452893 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 1002 | 154 | |
| 1037 | 54 | < 0.1% |
| 1066 | 88 | < 0.1% |
| 1082 | 98 | |
| 1092 | 89 | < 0.1% |
| 1115 | 68 | < 0.1% |
| 1195 | 60 | < 0.1% |
| 1235 | 102 | |
| 1253 | 230 | |
| 1256 | 117 |
Maximum 10 values
| Value | Count | Frequency (%) |
| 99347 | 82 | < 0.1% |
| 99346 | 81 | < 0.1% |
| 99330 | 53 | < 0.1% |
| 99256 | 241 | |
| 99223 | 66 | < 0.1% |
| 99206 | 61 | < 0.1% |
| 99176 | 89 | < 0.1% |
| 99173 | 67 | < 0.1% |
| 99158 | 120 | |
| 99153 | 198 |
SignDate
Date
| Distinct | 455 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.5 MiB |
| Minimum | 2011-05-10 00:00:00 |
| Maximum | 2013-07-31 00:00:00 |
Histogram with fixed size bins (bins=50)
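The report does not state how the Year, Month and Day variables were produced, but the sample rows suggest they are the calendar parts of SignDate (2013-06-04 appears alongside 2013, 6 and 4). A minimal sketch of that split, under that assumption:

```python
from datetime import date

def split_sign_date(iso_string):
    """Return (year, month, day) for a date written as YYYY-MM-DD."""
    d = date.fromisoformat(iso_string)
    return d.year, d.month, d.day

# Dates copied from the first and last sample rows of this report.
print(split_sign_date("2013-06-04"))
print(split_sign_date("2012-11-10"))
```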
Artist
Text
| Distinct | 357 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.5 MiB |
Length
| Max length | 37 |
|---|---|
| Median length | 29 |
| Mean length | 11.188966 |
| Min length | 3 |
Characters and Unicode
| Total characters | 5110303 |
|---|---|
| Distinct characters | 42 |
| Distinct categories | 5 |
| Distinct scripts | 2 |
| Distinct blocks | 2 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
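As an illustration of the character properties mentioned above, Python's standard `unicodedata` module exposes the general category of each code point. The two-letter codes map onto the category names used in this report: `Ll` is Lowercase Letter, `Zs` is Space Separator, `Po` is Other Punctuation, `Nd` is Decimal Number and `Pd` is Dash Punctuation. This is a sketch of the idea, not necessarily how the report itself computes its tables:

```python
import unicodedata
from collections import Counter

def category_counts(strings):
    """Count characters by Unicode general category code (e.g. 'Ll', 'Nd')."""
    return Counter(unicodedata.category(ch) for s in strings for ch in s)

# Artist names copied from the sample rows of this report.
names = [".38 special", "ac/dc", "3 doors down"]
counts = category_counts(names)
for code, n in counts.most_common():
    print(code, n)
```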
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | blue and gold |
|---|---|
| 2nd row | girls |
| 3rd row | i love you |
| 4th row | one scotch |
| 5th row | train |
Most occurring words
| Value | Count | Frequency (%) |
| the | 33812 | 3.9% |
| band | 17168 | 2.0% |
| bob | 13654 | 1.6% |
| billy | 13200 | 1.5% |
| | 12502 | 1.4% |
| john | 12433 | 1.4% |
| beatles | 9914 | 1.1% |
| brothers | 9528 | 1.1% |
| alice | 9461 | 1.1% |
| david | 8200 | 0.9% |
| Other values (507) | 729887 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 588332 | 11.5% |
| (space) | 413032 | 8.1% |
| a | 384554 | 7.5% |
| r | 349085 | 6.8% |
| o | 346097 | 6.8% |
| n | 308519 | 6.0% |
| l | 291886 | 5.7% |
| i | 263635 | 5.2% |
| t | 260320 | 5.1% |
| s | 259977 | 5.1% |
| Other values (32) | 1644866 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 4625772 | |
| Space Separator | 413032 | 8.1% |
| Other Punctuation | 41335 | 0.8% |
| Decimal Number | 23899 | 0.5% |
| Dash Punctuation | 6265 | 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 588332 | |
| a | 384554 | 8.3% |
| r | 349085 | 7.5% |
| o | 346097 | 7.5% |
| n | 308519 | 6.7% |
| l | 291886 | 6.3% |
| i | 263635 | 5.7% |
| t | 260320 | 5.6% |
| s | 259977 | 5.6% |
| c | 213366 | 4.6% |
| Other values (18) | 1360001 |
Decimal Number
| Value | Count | Frequency (%) |
| 3 | 8767 | |
| 8 | 5788 | |
| 1 | 3849 | |
| 0 | 2996 | 12.5% |
| 2 | 1422 | 6.0% |
| 5 | 569 | 2.4% |
| 6 | 508 | 2.1% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 15278 | |
| & | 11811 | |
| / | 6110 | 14.8% |
| ' | 4265 | 10.3% |
| ? | 3871 | 9.4% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 413032 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 6265 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 4625772 | |
| Common | 484531 | 9.5% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 588332 | |
| a | 384554 | 8.3% |
| r | 349085 | 7.5% |
| o | 346097 | 7.5% |
| n | 308519 | 6.7% |
| l | 291886 | 6.3% |
| i | 263635 | 5.7% |
| t | 260320 | 5.6% |
| s | 259977 | 5.6% |
| c | 213366 | 4.6% |
| Other values (18) | 1360001 |
Common
| Value | Count | Frequency (%) |
| (space) | 413032 | |
| . | 15278 | 3.2% |
| & | 11811 | 2.4% |
| 3 | 8767 | 1.8% |
| - | 6265 | 1.3% |
| / | 6110 | 1.3% |
| 8 | 5788 | 1.2% |
| ' | 4265 | 0.9% |
| ? | 3871 | 0.8% |
| 1 | 3849 | 0.8% |
| Other values (4) | 5495 | 1.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 5108550 | |
| None | 1753 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 588332 | 11.5% |
| (space) | 413032 | 8.1% |
| a | 384554 | 7.5% |
| r | 349085 | 6.8% |
| o | 346097 | 6.8% |
| n | 308519 | 6.0% |
| l | 291886 | 5.7% |
| i | 263635 | 5.2% |
| t | 260320 | 5.1% |
| s | 259977 | 5.1% |
| Other values (30) | 1643113 |
None
| Value | Count | Frequency (%) |
| ö | 1384 | |
| ÿ | 369 | 21.0% |
playscount
Real number (ℝ)
SKEWED 
| Distinct | 126 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.8832344 |
| Minimum | 1 |
| Maximum | 449 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.7 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 2 |
| 95-th percentile | 5 |
| Maximum | 449 |
| Range | 448 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 3.156845 |
|---|---|
| Coefficient of variation (CV) | 1.676289 |
| Kurtosis | 2358.9595 |
| Mean | 1.8832344 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 29.655747 |
| Sum | 860124 |
| Variance | 9.9656706 |
| Monotonicity | Not monotonic |
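The combination of a median of 1, a MAD of 0 and a very large skewness is what one would expect here: 307175 of the 456727 rows have a play count of 1, so more than half of the absolute deviations from the median are zero, while a few very large counts stretch the right tail. The sketch below reproduces the pattern on a hypothetical toy sample. It uses the plain moment-based skewness, whereas the report's estimator is not stated and may apply a small-sample correction:

```python
import statistics

def mad(values):
    """Median absolute deviation around the median."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)

def moment_skewness(values):
    """Population skewness m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical toy sample, not taken from the dataset.
sample = [1] * 7 + [2, 3, 40]
print(statistics.median(sample), mad(sample), round(moment_skewness(sample), 2))
```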
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
| 1 | 307175 | 67.3% |
| 2 | 80652 | 17.7% |
| 3 | 29610 | 6.5% |
| 4 | 13269 | 2.9% |
| 5 | 7089 | 1.6% |
| 6 | 4330 | 0.9% |
| 7 | 3074 | 0.7% |
| 8 | 2244 | 0.5% |
| 9 | 1702 | 0.4% |
| 10 | 1340 | 0.3% |
| Other values (116) | 6242 | 1.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 1 | 307175 | 67.3% |
| 2 | 80652 | 17.7% |
| 3 | 29610 | 6.5% |
| 4 | 13269 | 2.9% |
| 5 | 7089 | 1.6% |
| 6 | 4330 | 0.9% |
| 7 | 3074 | 0.7% |
| 8 | 2244 | 0.5% |
| 9 | 1702 | 0.4% |
| 10 | 1340 | 0.3% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 449 | 1 | |
| 385 | 1 | |
| 321 | 1 | |
| 294 | 1 | |
| 242 | 1 | |
| 239 | 1 | |
| 234 | 1 | |
| 199 | 1 | |
| 190 | 1 | |
| 185 | 1 |
Year
Categorical
HIGH CORRELATION  IMBALANCE 
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.5 MiB |
Length
| Max length | 4 |
|---|---|
| Median length | 4 |
| Mean length | 4 |
| Min length | 4 |
Characters and Unicode
| Total characters | 1826908 |
|---|---|
| Distinct characters | 4 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2013 |
|---|---|
| 2nd row | 2013 |
| 3rd row | 2013 |
| 4th row | 2013 |
| 5th row | 2013 |
Common Values
| Value | Count | Frequency (%) |
| 2013 | 400760 | 87.7% |
| 2012 | 54721 | 12.0% |
| 2011 | 1246 | 0.3% |
Length
Histogram of lengths of the category
Most occurring characters
| Value | Count | Frequency (%) |
| 2 | 511448 | |
| 1 | 457973 | |
| 0 | 456727 | |
| 3 | 400760 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 1826908 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 511448 | |
| 1 | 457973 | |
| 0 | 456727 | |
| 3 | 400760 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 1826908 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 2 | 511448 | |
| 1 | 457973 | |
| 0 | 456727 | |
| 3 | 400760 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1826908 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 2 | 511448 | |
| 1 | 457973 | |
| 0 | 456727 | |
| 3 | 400760 |
Month
Real number (ℝ)
HIGH CORRELATION 
| Distinct | 12 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5.7306531 |
| Minimum | 1 |
| Maximum | 12 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.7 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 4 |
| median | 6 |
| Q3 | 7 |
| 95-th percentile | 11 |
| Maximum | 12 |
| Range | 11 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 2.3793526 |
|---|---|
| Coefficient of variation (CV) | 0.41519746 |
| Kurtosis | 0.69417649 |
| Mean | 5.7306531 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.40153434 |
| Sum | 2617344 |
| Variance | 5.6613188 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=12)
Common values
| Value | Count | Frequency (%) |
| 7 | 126553 | 27.7% |
| 6 | 91074 | 19.9% |
| 5 | 67254 | 14.7% |
| 4 | 48973 | 10.7% |
| 3 | 36089 | 7.9% |
| 2 | 21035 | 4.6% |
| 1 | 20452 | 4.5% |
| 12 | 16186 | 3.5% |
| 11 | 10964 | 2.4% |
| 10 | 8366 | 1.8% |
| Other values (2) | 9781 | 2.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
| 1 | 20452 | 4.5% |
| 2 | 21035 | 4.6% |
| 3 | 36089 | 7.9% |
| 4 | 48973 | 10.7% |
| 5 | 67254 | 14.7% |
| 6 | 91074 | 19.9% |
| 7 | 126553 | 27.7% |
| 8 | 4447 | 1.0% |
| 9 | 5334 | 1.2% |
| 10 | 8366 | 1.8% |
Maximum 10 values
| Value | Count | Frequency (%) |
| 12 | 16186 | 3.5% |
| 11 | 10964 | 2.4% |
| 10 | 8366 | 1.8% |
| 9 | 5334 | 1.2% |
| 8 | 4447 | 1.0% |
| 7 | 126553 | 27.7% |
| 6 | 91074 | 19.9% |
| 5 | 67254 | 14.7% |
| 4 | 48973 | 10.7% |
| 3 | 36089 | 7.9% |
Day
Real number (ℝ)
| Distinct | 31 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 16.31602 |
| Minimum | 1 |
| Maximum | 31 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.7 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 9 |
| median | 17 |
| Q3 | 24 |
| 95-th percentile | 30 |
| Maximum | 31 |
| Range | 30 |
| Interquartile range (IQR) | 15 |
Descriptive statistics
| Standard deviation | 8.8165692 |
|---|---|
| Coefficient of variation (CV) | 0.54036273 |
| Kurtosis | -1.1997137 |
| Mean | 16.31602 |
| Median Absolute Deviation (MAD) | 8 |
| Skewness | -0.078498956 |
| Sum | 7451967 |
| Variance | 77.731892 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=31)
Common values
| Value | Count | Frequency (%) |
| 23 | 18066 | 4.0% |
| 27 | 18038 | 3.9% |
| 26 | 17609 | 3.9% |
| 19 | 16514 | 3.6% |
| 13 | 16088 | 3.5% |
| 20 | 15770 | 3.5% |
| 30 | 15616 | 3.4% |
| 18 | 15448 | 3.4% |
| 16 | 15410 | 3.4% |
| 10 | 15393 | 3.4% |
| Other values (21) | 292775 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 1 | 12276 | |
| 2 | 14789 | |
| 3 | 14528 | |
| 4 | 14841 | |
| 5 | 12517 | |
| 6 | 13837 | |
| 7 | 14187 | |
| 8 | 13195 | |
| 9 | 14733 | |
| 10 | 15393 |
Maximum 10 values
| Value | Count | Frequency (%) |
| 31 | 9560 | |
| 30 | 15616 | |
| 29 | 14916 | |
| 28 | 15354 | |
| 27 | 18038 | |
| 26 | 17609 | |
| 25 | 15240 | |
| 24 | 15257 | |
| 23 | 18066 | |
| 22 | 15287 |
ZipCode_Frequency
Real number (ℝ)
HIGH CORRELATION 
| Distinct | 255 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 119.2205 |
| Minimum | 45 |
| Maximum | 475 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.5 MiB |
Quantile statistics
| Minimum | 45 |
|---|---|
| 5-th percentile | 60 |
| Q1 | 74 |
| median | 96 |
| Q3 | 145 |
| 95-th percentile | 259 |
| Maximum | 475 |
| Range | 430 |
| Interquartile range (IQR) | 71 |
Descriptive statistics
| Standard deviation | 63.68349 |
|---|---|
| Coefficient of variation (CV) | 0.5341656 |
| Kurtosis | 3.4478556 |
| Mean | 119.2205 |
| Median Absolute Deviation (MAD) | 27 |
| Skewness | 1.7399225 |
| Sum | 54451221 |
| Variance | 4055.5869 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
| 70 | 8470 | 1.9% |
| 68 | 8432 | 1.8% |
| 71 | 7810 | 1.7% |
| 69 | 7797 | 1.7% |
| 75 | 7425 | 1.6% |
| 65 | 7345 | 1.6% |
| 73 | 7227 | 1.6% |
| 72 | 6984 | 1.5% |
| 66 | 6864 | 1.5% |
| 80 | 6720 | 1.5% |
| Other values (245) | 381653 |
Minimum 10 values
| Value | Count | Frequency (%) |
| 45 | 45 | < 0.1% |
| 46 | 92 | < 0.1% |
| 47 | 188 | < 0.1% |
| 48 | 144 | < 0.1% |
| 49 | 294 | 0.1% |
| 50 | 700 | |
| 51 | 561 | 0.1% |
| 52 | 832 | |
| 53 | 1325 | |
| 54 | 1512 |
Maximum 10 values
| Value | Count | Frequency (%) |
| 475 | 475 | |
| 452 | 452 | |
| 389 | 389 | |
| 366 | 366 | |
| 364 | 364 | |
| 361 | 361 | |
| 360 | 360 | |
| 359 | 359 | |
| 355 | 355 | |
| 353 | 353 |
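In the table above each of the ten largest values occurs exactly as many times as its own magnitude (475 occurs 475 times), and the smallest values occur in whole multiples of themselves (46 occurs 92 times). That pattern, together with the most common zip value 72132 appearing in 475 rows, is what a frequency encoding of zip would produce. The report does not document how the column was built, so the sketch below is an assumption about its construction, shown on a hypothetical toy column:

```python
from collections import Counter

def frequency_encode(values):
    """Replace each value by the number of times it occurs in the column."""
    counts = Counter(values)
    return [counts[v] for v in values]

# Hypothetical toy column of ZIP codes, not taken from the dataset.
zips = [72132, 72132, 72132, 4042, 4042, 17307]
encoded = frequency_encode(zips)
print(encoded)
```

With this encoding, every encoded value f necessarily appears in a multiple of f rows, which matches the counts listed above.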
Correlations
| | CustID | Day | Gender | Month | Year | ZipCode_Frequency | playscount | zip |
|---|---|---|---|---|---|---|---|---|
| CustID | 1.000 | 0.024 | 0.036 | 0.002 | 0.045 | -0.830 | -0.205 | -0.012 |
| Day | 0.024 | 1.000 | 0.030 | 0.016 | 0.043 | -0.022 | -0.009 | -0.030 |
| Gender | 0.036 | 0.030 | 1.000 | 0.001 | 0.015 | -0.008 | -0.005 | 0.017 |
| Month | 0.002 | 0.016 | 0.001 | 1.000 | 0.632 | 0.000 | -0.000 | -0.005 |
| Year | 0.045 | 0.043 | 0.015 | 0.632 | 1.000 | -0.017 | -0.005 | 0.032 |
| ZipCode_Frequency | -0.830 | -0.022 | -0.008 | 0.000 | -0.017 | 1.000 | 0.175 | 0.025 |
| playscount | -0.205 | -0.009 | -0.005 | -0.000 | -0.005 | 0.175 | 1.000 | 0.001 |
| zip | -0.012 | -0.030 | 0.017 | -0.005 | 0.032 | 0.025 | 0.001 | 1.000 |
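The high-correlation alerts near the top of the report can be read directly off this matrix. The sketch below lists every pair whose absolute coefficient reaches a cut-off. The value 0.5 is an assumption, since the report does not state the threshold it used, but with it the flagged pairs are exactly the two reported: CustID with ZipCode_Frequency, and Month with Year.

```python
THRESHOLD = 0.5  # assumed cut-off; the report does not state the value it used

names = ["CustID", "Day", "Gender", "Month", "Year",
         "ZipCode_Frequency", "playscount", "zip"]

# Coefficients copied from the correlation table of this report.
matrix = [
    [1.000, 0.024, 0.036, 0.002, 0.045, -0.830, -0.205, -0.012],
    [0.024, 1.000, 0.030, 0.016, 0.043, -0.022, -0.009, -0.030],
    [0.036, 0.030, 1.000, 0.001, 0.015, -0.008, -0.005, 0.017],
    [0.002, 0.016, 0.001, 1.000, 0.632, 0.000, -0.000, -0.005],
    [0.045, 0.043, 0.015, 0.632, 1.000, -0.017, -0.005, 0.032],
    [-0.830, -0.022, -0.008, 0.000, -0.017, 1.000, 0.175, 0.025],
    [-0.205, -0.009, -0.005, -0.000, -0.005, 0.175, 1.000, 0.001],
    [-0.012, -0.030, 0.017, -0.005, 0.032, 0.025, 0.001, 1.000],
]

# Scan the upper triangle only, so each pair is reported once.
pairs = [
    (names[i], names[j], matrix[i][j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(matrix[i][j]) >= THRESHOLD
]
for a, b, r in pairs:
    print(f"{a} ~ {b}: {r:+.3f}")
```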
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completeness.
First rows
| | CustID | Gender | zip | SignDate | Artist | playscount | Year | Month | Day | ZipCode_Frequency |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 72132 | 2013-06-04 | blue and gold | 11 | 2013 | 6 | 4 | 475 |
| 1 | 0 | 0 | 72132 | 2013-06-04 | girls | 2 | 2013 | 6 | 4 | 475 |
| 2 | 0 | 0 | 72132 | 2013-06-04 | i love you | 3 | 2013 | 6 | 4 | 475 |
| 3 | 0 | 0 | 72132 | 2013-06-04 | one scotch | 4 | 2013 | 6 | 4 | 475 |
| 4 | 0 | 0 | 72132 | 2013-06-04 | train | 17 | 2013 | 6 | 4 | 475 |
| 5 | 0 | 0 | 72132 | 2013-06-04 | .38 special | 242 | 2013 | 6 | 4 | 475 |
| 6 | 0 | 0 | 72132 | 2013-06-04 | 10cc | 45 | 2013 | 6 | 4 | 475 |
| 7 | 0 | 0 | 72132 | 2013-06-04 | 3 doors down | 70 | 2013 | 6 | 4 | 475 |
| 8 | 0 | 0 | 72132 | 2013-06-04 | ac/dc | 449 | 2013 | 6 | 4 | 475 |
| 9 | 0 | 0 | 72132 | 2013-06-04 | aerosmith | 12 | 2013 | 6 | 4 | 475 |
Last rows
| | CustID | Gender | zip | SignDate | Artist | playscount | Year | Month | Day | ZipCode_Frequency |
|---|---|---|---|---|---|---|---|---|---|---|
| 456717 | 4999 | 1 | 33662 | 2012-11-10 | night ranger | 1 | 2012 | 11 | 10 | 152 |
| 456718 | 4999 | 1 | 33662 | 2012-11-10 | pat travers | 1 | 2012 | 11 | 10 | 152 |
| 456719 | 4999 | 1 | 33662 | 2012-11-10 | paul mccartney | 2 | 2012 | 11 | 10 | 152 |
| 456720 | 4999 | 1 | 33662 | 2012-11-10 | peter gabriel | 1 | 2012 | 11 | 10 | 152 |
| 456721 | 4999 | 1 | 33662 | 2012-11-10 | pink floyd | 2 | 2012 | 11 | 10 | 152 |
| 456722 | 4999 | 1 | 33662 | 2012-11-10 | police | 1 | 2012 | 11 | 10 | 152 |
| 456723 | 4999 | 1 | 33662 | 2012-11-10 | queens of the stone age | 1 | 2012 | 11 | 10 | 152 |
| 456724 | 4999 | 1 | 33662 | 2012-11-10 | queensryche | 1 | 2012 | 11 | 10 | 152 |
| 456725 | 4999 | 1 | 33662 | 2012-11-10 | the guess who | 1 | 2012 | 11 | 10 | 152 |
| 456726 | 4999 | 1 | 33662 | 2012-11-10 | tom petty | 1 | 2012 | 11 | 10 | 152 |
Most frequently occurring
| | CustID | Gender | zip | SignDate | Artist | playscount | Year | Month | Day | ZipCode_Frequency | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 17307 | 2013-07-27 | crosby | 6 | 2013 | 7 | 27 | 389 | 3 |
| 118 | 34 | 0 | 83717 | 2013-06-11 | crosby | 1 | 2013 | 6 | 11 | 251 | 3 |
| 183 | 53 | 0 | 40372 | 2013-05-02 | crosby | 1 | 2013 | 5 | 2 | 236 | 3 |
| 282 | 85 | 1 | 72359 | 2013-06-27 | paul mccartney | 1 | 2013 | 6 | 27 | 219 | 3 |
| 406 | 132 | 0 | 67749 | 2013-06-18 | crosby | 1 | 2013 | 6 | 18 | 303 | 3 |
| 430 | 138 | 1 | 8073 | 2013-05-20 | crosby | 1 | 2013 | 5 | 20 | 265 | 3 |
| 719 | 248 | 0 | 48870 | 2013-06-05 | crosby | 1 | 2013 | 6 | 5 | 189 | 3 |
| 1241 | 474 | 1 | 18651 | 2013-07-22 | paul mccartney | 1 | 2013 | 7 | 22 | 137 | 3 |
| 1553 | 621 | 0 | 63785 | 2013-05-30 | paul mccartney | 1 | 2013 | 5 | 30 | 126 | 3 |
| 1656 | 666 | 0 | 25113 | 2013-06-18 | crosby | 1 | 2013 | 6 | 18 | 134 | 3 |
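A table like the one above can be produced by counting identical rows. The sketch below does this on hypothetical toy rows. It returns both the repeated rows with their counts and the number of surplus copies, because the report does not say which of these conventions lies behind its figure of 7609 duplicate rows:

```python
from collections import Counter

def duplicate_summary(rows):
    """Return ({row: count} for rows seen more than once, number of surplus copies)."""
    counts = Counter(rows)
    repeated = {row: n for row, n in counts.items() if n > 1}
    surplus = sum(n - 1 for n in repeated.values())
    return repeated, surplus

# Hypothetical toy rows (CustID, Artist, playscount), not taken from the dataset.
rows = [
    (1, "crosby", 6), (1, "crosby", 6), (1, "crosby", 6),
    (85, "paul mccartney", 1), (85, "paul mccartney", 1),
    (2, "train", 17),
]
repeated, surplus = duplicate_summary(rows)
for row, n in repeated.items():
    print(row, n)
print("surplus copies:", surplus)
```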